Discovering Knowledge through Multi-modal Association Rule Mining for Document Image Analysis

Authors

  • Michelangelo Ceci
  • Corrado Loglisci
  • Lynn Rudd
  • Donato Malerba
Abstract

The paper introduces a descriptive data mining method to discover knowledge for the task of automatic categorization in document image analysis. We argue that a document image is a multi-modal unit of analysis whose semantics is deduced from a combination of textual content, layout structure and logical structure. The method therefore considers different modalities of document representation simultaneously, and hence different types of information: spatial information derived from a complex document image analysis process (layout analysis), information extracted from the logical structure of the document (by means of document image classification and understanding), and textual information extracted by means of OCR. The proposed method is based on a relational data mining approach to discovering association rules; the relational setting is justified by its suitability for analyzing data available in more than one modality. Experimental results on a real-world dataset are reported.
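
To make the idea of association rules over several modalities concrete, the following is a minimal sketch and not the authors' relational algorithm. It assumes a simplified, propositional setting in which each document is flattened into a set of items tagged by modality (layout, logical structure, text). The toy data, the item names, and the helper functions `support`, `confidence`, and `mine_rules` are all hypothetical and introduced only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each document is a set of items, one or more per modality.
documents = [
    {"layout:title_on_top", "logical:abstract", "text:mining"},
    {"layout:title_on_top", "logical:abstract", "text:mining"},
    {"layout:title_on_top", "logical:abstract", "text:image"},
    {"layout:two_columns", "logical:references", "text:mining"},
]


def support(itemset, docs):
    """Fraction of documents that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= doc for doc in docs) / len(docs)


def confidence(antecedent, consequent, docs):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, docs)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), docs) / base


def mine_rules(docs, min_support=0.5, min_confidence=0.7):
    """Enumerate single-item rules lhs -> rhs that pass both thresholds."""
    items = sorted(set().union(*docs))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, docs)
        if pair_support < min_support:
            continue  # the pair is not frequent enough to yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, docs)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, conf))
    return rules


if __name__ == "__main__":
    for lhs, rhs, sup, conf in mine_rules(documents):
        print(f"{lhs} -> {rhs} (support={sup:.2f}, confidence={conf:.2f})")
```

A rule such as `layout:title_on_top -> logical:abstract` links a spatial property to a logical one, which is the kind of cross-modality pattern the abstract refers to. A relational approach, as described in the paper, would work on structured descriptions rather than on flat item sets.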


Similar resources

A Novel Method for Selecting the Supplier Based on Association Rule Mining

One of the important problems in supply chain management is supplier selection. A company holds massive amounts of data from various departments, so extracting knowledge from that data is very complicated. Many researchers have addressed this problem with methods such as fuzzy set theory, goal programming, multi-objective programming, linear programming, mixed integer programming, analyti...


A New Model for Discovering XML Association Rules from XML Documents

The inherent flexibilities of XML in both structure and semantics make mining from XML data a complex task with more challenges compared to traditional association rule mining in relational databases. In this paper, we propose a new model for the effective extraction of generalized association rules from an XML document collection. We directly use frequent subtree mining techniques in the disco...


Multi-level Association Rule Mining: an Object-oriented Approach Based on Dynamic Hierarchies

Previous studies in data mining have yielded efficient algorithms for discovering association rules. But it is a well-known problem that the two controlling measures of support and confidence, when used as the sole definition of relevant association rules, are too inclusive: interesting rules are included with many uninteresting cases. A typical approach to this problem is to augment the thresholds ...


A Survey on Infrequent Weighted Itemset Mining Approaches

Association Rule Mining (ARM) is one of the most popular data mining techniques. All existing work is based on frequent itemsets. Frequent itemsets find application in a number of real-life contexts, e.g., market basket analysis, medical image processing, and biological data analysis. In recent years, the attention of researchers has been focused on infrequent itemset mining. This paper tackles the issue...


Mining Association Rules from Unstructured Documents

This paper presents a system, called EART (Extract Association Rules from Text), for discovering association rules from collections of unstructured documents. The EART system treats text only, not images or figures. EART discovers association rules amongst keywords labeling the collection of textual documents. The main characteristic of EART is that the system integrates XML technology (to transf...



Publication date: 2015